Abstract. We discuss the problem of proteasomal degradation of proteins. Though
proteasomes are important for all aspects of cellular metabolism, some details of
the physical mechanism of the process remain unknown. We introduce a stochastic
model of the proteasomal degradation of proteins, which accounts for the protein
translocation and the topology of the positioning of cleavage centers of a proteasome
from first principles. For this model we develop a mathematical description based
on a master equation, together with techniques for reconstructing, from mass
spectroscopy data on digestion patterns, the cleavage specificity inherent to proteins
and the proteasomal translocation rates, which are a property of the proteasome
species. With these properties determined, one can quantitatively predict digestion
patterns for new experimental set-ups. Additionally, we design an experimental set-up
for a synthetic polypeptide with a periodic sequence of amino acids, which enables
especially reliable determination of translocation rates.

The proteasome is a macromolecular complex that serves as the molecular machine for
the degradation of intracellular proteins [1]. In particular, proteasomes produce epitopes
for the immune system [2]. They exist in cells as the free proteolytically active core, the
barrel-shaped 20S proteasome (figure 1), and as associations of this core with regulatory
complexes PA700 (19S regulator) or PA28 (11S regulator) at its ends [3]. This paper
deals with the proteasomal digestion of proteins, which is widely studied in molecular
biology and immunology.

A protein enters the proteasome and is transported into the central chamber
(this process is referred to as translocation), where it is cleaved into fragments
by one of the cleavage terminals arranged along two rings. The fragments produced
are removed through the proteasome gates. Some of these fragments, epitopes,
are transported onto the cell surface, where T-lymphocytes scan them in order to
recognize cells that must be killed because of abnormal functioning. Hence, the digestion
pattern of a degraded protein and its statistical properties determine the reaction of
the immune system to the presence of this protein in a certain cell. Peculiarities of
the translocation rates can qualitatively affect the yield of a specific fragment,
e.g., an epitope, because altered transport changes the time spent near a cleavage
terminal, i.e., the conditions of cleavage. Moreover, impairment of proteasomal degradation,
probably due to transport malfunction, might contribute to the pathology of various
neurodegenerative conditions [4].

The mechanism of protein translocation remains unknown (however, subjects
related to some extent to this problem have been studied in [5] [6] [7] [8]). It is also
unknown whether this mechanism is qualitatively different for different proteasome types
(constitutive or immuno-), with or without different regulatory complexes. Recently, a
stochastic model which allows a straightforward reconstruction of the translocation
rates and cleavage specificities from mass spectroscopy (MS) data on digestion patterns
has been introduced in [9]. The reconstructed properties can be used for a comprehensive
quantitative prediction of proteasomal digestion patterns for new proteins and new
experimental set-ups. In this paper we elaborate the mathematical theory needed to
apply this model to relatively short synthetic polypeptides (section 2),
long proteins with a periodic sequence of amino acids (section 3), and long natural
proteins, which require a special approach (section 4).

1. Physical model of the system and mathematical description 

We describe the process of protein transport and degradation by the proteasome (see
figure 1) within the framework of the following assumptions.

• Protein translocation: The infiltration of a protein into the
proteasome chamber is a sequence of thermal-noise-induced jumps of the protein strand
by one amino acid (AA). In figure 1, the zoom-in of the chamber gate schematically
shows the diameter of the gate to be comparable with the characteristic size of an AA,
which means that the protein chain may be fixed in metastable states by a tight gate
between successive jumps due to large thermal fluctuations. Indeed, atomic force
measurements reveal $U_b/kT > 3$ [10], where $U_b$ is the characteristic height of the energy
barrier separating nearest metastable positions of the chain and $kT/2$ is the energy
of thermal fluctuations. The probability of the protein shift by one AA during the
infinitesimal time interval $dt$ into the proteasome (to the right in figure 1) is assumed to
depend only on the length $x$ of the protein forward end beyond the active sites nearest to
the proteasome chamber gate used for protein infiltration (the left ones in figure 1); this
probability divided by $dt$ is given by the translocation rate function (TRF) $v(x) \equiv v_x$.
In this way, we neglect the role of the AA sequence specificity for translocation, which
is suggested by the non-covalent interaction between the proteasome and the retracted
protein. The backward motions of the entering strand are neglected as well (from [10],
for the potential energy $U(x)$ of the metastable state $x$, $(U(x-1) - U(x+1))/2kT \approx 2.5$,
meaning that the probability of a backward motion is diminished by the factor $e^{-2.5}$
against the forward one). These assumptions do not impose significant restrictions on
the physical mechanism of the translocation process: they are valid for the thermal drift
in a tilted spatially periodic potential (e.g., see [11]) as well as for the ratchet effect (e.g.,
see [8]), etc. The TRFs of different proteasome species (20S; 26S, which is the association
of the 20S core and 19S regulatory complexes; etc. [3]) differ.

• Cleavage: When the protein strand is close to the active site, the probability of
cleavage during the infinitesimal time interval $dt$ depends on the sequence of AAs nearest
to the peptide bond cleaved [12]. For the given protein, this conditional probability
divided by $dt$, in other words, the conditional cleavage rate (CCR), $\gamma(\tau) \equiv \gamma_\tau$, is a function
of the bond number $\tau$ (precisely, $\tau$ enumerates the position of the bond within the initial
protein and is counted from the end which has first entered the proteasome; see figure 1).
In the following we use the number $\tau$ of the bond nearest to the first ring of active sites
as a time-like variable.

• Removal of digestion products: The cleaved parts of the degraded protein,
peptides, leave the chamber through the second proteasome gate. Because their mobility
is higher than that of the protein, processed peptides leave the chamber
quickly enough to neglect both their possible further splitting and their influence on the
protein translocation.

Figure 1. Infiltration of a protein strand into the 20S proteasome: The scissors
mark the positions of the rings of active sites at $x = 0$ and $x = L$; the cleavage occurs via
the attaching-detaching of the protein to active sites (dark-grey color). The zoom-in
shows the protein fragment KEFNII passing through the gate; the electron shields are
presented in pale colors.

Let us now introduce the distribution $w(x|\tau)$, which is the probability that the protein
forward end beyond the first ring of active sites has length $x$ when the
$\tau$th bond is near that ring or, in the terms we use henceforth, at the discrete "time moment"
$\tau$. We measure $x$ in AA. Note that $x$ and $\tau$ are integers. In the following we describe the
"temporal" evolution of the distribution $w(x|\tau)$. To this end, we treat the shift of the
protein strand into the proteasome by one AA, i.e., the transition $\tau \to \tau+1$. Let us
decompose $w(x|\tau+1)$ as

$$w(x|\tau+1) = \sum_j w_j(x|\tau+1),$$

where $w_j(x|\tau+1)$ are the contributions due to different scenarios of this transition.
Along with $w(x|\tau)$, we keep account of $Q(n,m|\tau)$, the amount of the peptide $(n,m)$, which is
the $m$-$n$ subsequence of the degraded protein (see figure 1), generated during the transition
$\tau \to \tau+1$.

In the process of protein digestion there are three possible elementary events:

(a) the strand shift: $x \to x+1$, $\tau \to \tau+1$; the event rate is $v(x)$;

(b) the cleavage on the first ring of cleavage centers ($x = 0$): $x \to 0$, $\tau \to \tau$; the event
rate is $\gamma(\tau)$;

(c) the cleavage on the second ring of cleavage centers ($x = L$, where $L$ is the distance between
the rings of cleavage centers): $x \to L$, $\tau \to \tau$; the event rate is $\gamma(\tau-L)$.

In terms of these elementary events the possible scenarios of transition are

(1) Elementary event (a). Its probability is

$$P_1(x|\tau) = \begin{cases} v_x/(v_x+\gamma_\tau), & x \le L; \\ v_x/(v_x+\gamma_\tau+\gamma_{\tau-L}), & x > L. \end{cases}$$

In this scenario, $x \to x+1$, and

$$w_1(x+1|\tau+1) = P_1(x|\tau)\,w(x|\tau). \qquad (1)$$

No peptides are generated;

(2) Elementary event (b), which may not be followed by anything but the strand shift
by one AA (as there is nothing to be cleaved). The probability of this scenario is

$$P_2(x|\tau) = \begin{cases} \gamma_\tau/(v_x+\gamma_\tau), & x \le L; \\ \gamma_\tau/(v_x+\gamma_\tau+\gamma_{\tau-L}), & x > L. \end{cases}$$

In this scenario, $x \to 1$, and

$$w_2(x|\tau+1) = \delta_{x,1}\sum_{x'=1}^{\infty} P_2(x'|\tau)\,w(x'|\tau). \qquad (2)$$

The peptides cut out are

$$Q_2(\tau,\tau-x+1|\tau) = P_2(x|\tau)\,w(x|\tau); \qquad (3)$$

(3) Elementary event (c), which may be followed either by the strand shift (1) or by scenario (2).
The probability of the first stage (c) is

$$P_c(x|\tau) = \begin{cases} 0, & x \le L; \\ \gamma_{\tau-L}/(v_x+\gamma_\tau+\gamma_{\tau-L}), & x > L. \end{cases}$$

After event (c), when $x \to L$, the distribution of the system states generated is

$$w_c(x|\tau) = \delta_{x,L}\sum_{x'=L+1}^{\infty} P_c(x'|\tau)\,w(x'|\tau),$$

and the peptides cut out are

$$Q_c(\tau-L,\tau-x+1|\tau) = P_c(x|\tau)\,w(x|\tau).$$

The subsequent events (1) or (2) should be considered as the respective above-mentioned
scenarios starting with the distribution $w_c(x|\tau)$, i.e.,

$$w_{c1}(x|\tau+1) = P_1(L|\tau)\,w_c(x-1|\tau) = P_1(L|\tau)\,\delta_{x,L+1}\sum_{x'=L+1}^{\infty} P_c(x'|\tau)\,w(x'|\tau), \qquad (4)$$

$$Q_{c1}(\tau-L,\tau-x+1|\tau) = P_1(L|\tau)\,Q_c(\tau-L,\tau-x+1|\tau) = P_1(L|\tau)\,P_c(x|\tau)\,w(x|\tau); \qquad (5)$$

$$w_{c2}(x|\tau+1) = \delta_{x,1}\sum_{x'=1}^{\infty} P_2(x'|\tau)\,w_c(x'|\tau) = \delta_{x,1}\,P_2(L|\tau)\sum_{x'=L+1}^{\infty} P_c(x'|\tau)\,w(x'|\tau), \qquad (6)$$

$$Q_{c2}(\tau-L,\tau-x+1|\tau) = P_2(L|\tau)\,Q_c(\tau-L,\tau-x+1|\tau) = P_2(L|\tau)\,P_c(x|\tau)\,w(x|\tau), \qquad (7)$$

$$Q_{c2}(\tau,\tau-x+1|\tau) = P_2(x|\tau)\,w_c(x|\tau) = \delta_{x,L}\,P_2(L|\tau)\sum_{x'=L+1}^{\infty} P_c(x'|\tau)\,w(x'|\tau). \qquad (8)$$

Collecting equations (1), (2), (4), (6), one finds the master equation

$$w(1|\tau+1) = \sum_{x=1}^{L}\frac{\gamma_\tau}{v_x+\gamma_\tau}\,w(x|\tau) + \left(1+\frac{\gamma_{\tau-L}}{v_L+\gamma_\tau}\right)\sum_{x=L+1}^{\infty}\frac{\gamma_\tau}{v_x+\gamma_\tau+\gamma_{\tau-L}}\,w(x|\tau); \qquad (9)$$

$$w(L+1|\tau+1) = \frac{v_L}{v_L+\gamma_\tau}\,w(L|\tau) + \frac{v_L}{v_L+\gamma_\tau}\sum_{x=L+1}^{\infty}\frac{\gamma_{\tau-L}}{v_x+\gamma_\tau+\gamma_{\tau-L}}\,w(x|\tau); \qquad (10)$$

$$w(x|\tau+1) = \frac{v_{x-1}\,w(x-1|\tau)}{v_{x-1}+\gamma_\tau+\Theta(x-L-1)\,\gamma_{\tau-L}} \quad \text{for } x \ne 1,\ L+1. \qquad (11)$$

Towards prediction of proteasomal digestion patterns of proteins 






Here $x = 1, 2, 3, \ldots, M$ and $\tau = 1, 2, 3, \ldots, M-1$, where $M$ is the length of the protein,
and the Heaviside function is $\Theta(x<0) = 0$, $\Theta(x \ge 0) = 1$. Equations (9)-(11) form a
linear map

$$w(x|\tau+1) = \sum_{y=1}^{\infty} C_{xy}(\tau)\,w(y|\tau). \qquad (12)$$

The whole contribution to the cleavage pattern is

$$Q(\tau,\tau-x+1|\tau) = Q_2(\tau,\tau-x+1|\tau) + Q_{c2}(\tau,\tau-x+1|\tau)$$
$$= \frac{\gamma_\tau\,w(x|\tau)}{v_x+\gamma_\tau+\Theta(x-L-1)\,\gamma_{\tau-L}} + \delta_{x,L}\,\frac{\gamma_\tau}{v_L+\gamma_\tau}\sum_{x'=L+1}^{\infty}\frac{\gamma_{\tau-L}\,w(x'|\tau)}{v_{x'}+\gamma_\tau+\gamma_{\tau-L}}, \qquad (13)$$

$$Q(\tau-L,\tau-L-x+1|\tau) = Q_{c1}(\tau-L,\tau-L-x+1|\tau) + Q_{c2}(\tau-L,\tau-L-x+1|\tau)$$
$$= \frac{\gamma_{\tau-L}\,w(L+x|\tau)}{v_{L+x}+\gamma_\tau+\gamma_{\tau-L}}. \qquad (14)$$

All the remaining elements $Q(m,n|\tau)$ [not specified by expressions (13), (14)] are zero. The
expressions for the digestion pattern $Q(m,n)$ after the processing of a single protein molecule
are different for short polypeptides and for long ones with a periodic AA sequence.
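As an illustration of how one transition $\tau \to \tau+1$ of the linear map can be evaluated numerically, the following is a minimal sketch in plain Python. The function name `transition`, the list-based layout (index 0 used as padding so that list indices coincide with $x$ and $\tau$), and the return convention are assumptions made for this example only; they are not part of the original model description.

```python
def transition(w, v, gamma, tau, L):
    """One step tau -> tau + 1 of the linear map (hypothetical helper).

    w[x]     : w(x|tau) for x = 1..X (index 0 is unused padding)
    v[x]     : translocation rate v_x, same indexing, all v[x] > 0
    gamma[t] : conditional cleavage rate of bond t (index 0 is padding)
    Returns (w_next, q_first, q_second), where
      q_first[x]  is the amount of fragment (tau, tau - x + 1), of length x,
      q_second[x] is the amount of fragment (tau - L, tau - x + 1), of length x - L.
    """
    X = len(w) - 1
    g1 = gamma[tau]                                  # first-ring rate
    g2 = gamma[tau - L] if tau - L >= 1 else 0.0     # second-ring rate
    w_next = [0.0] * (X + 1)
    q_first = [0.0] * (X + 1)
    q_second = [0.0] * (X + 1)
    cut = 0.0                      # probability mass reset to x = L by event (c)
    for x in range(1, X + 1):
        if w[x] == 0.0:
            continue
        gc = g2 if x > L else 0.0  # the second ring acts only for x > L
        denom = v[x] + g1 + gc
        if x + 1 <= X:             # mass shifted beyond X is truncated
            w_next[x + 1] += v[x] / denom * w[x]     # scenario (1)
        q_first[x] += g1 / denom * w[x]              # scenario (2)
        q_second[x] = gc / denom * w[x]              # event (c)
        cut += q_second[x]
    if cut > 0.0:
        p2_at_L = g1 / (v[L] + g1)                   # P_2(L|tau); P_1(L|tau) = 1 - p2_at_L
        w_next[L + 1] += (1.0 - p2_at_L) * cut       # event (c) followed by a shift
        q_first[L] += p2_at_L * cut                  # event (c) followed by scenario (2)
    w_next[1] = sum(q_first)       # every first-ring cleavage resets x to 1
    return w_next, q_first, q_second
```

For instance, `transition([0.0, 1.0, 0.0, 0.0], [1.0] * 4, [0.0, 1.0, 1.0], tau=1, L=2)` splits the initial state equally between a shift to $x = 2$ and a first-ring cleavage, since $v_1 = \gamma_1$ in that toy input. As long as no probability is shifted beyond the truncation length, the step conserves the total probability, which is a convenient sanity check.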

2. Short (25—50 AA) synthetic polypeptides 

First we consider the degradation of a short (25-50 AA) synthetic polypeptide (protein), the
most common situation for in vitro experiments. Here we start at $\tau = 1$ with
$w(x|\tau=1) = \delta_{x,1}$ and iterate the linear map (12) up to the last $\tau = M-1$. For a short
polypeptide the release of the last fragment from the chamber at the "time moment" $\tau = M$
should be additionally taken into account: $Q(M,M-x+1|M) \to Q(M,M-x+1|M) + w(x|M)$.
Hence, with $w(x|\tau)$ known for $\tau = 1, 2, \ldots, M$, one may evaluate the digestion pattern
$Q(m,n)$ from (13) and (14),

$$Q(\tau_1,\tau_2) = Q(\tau_1,\tau_2|\tau_1) + \Theta(M-\tau_1-L)\,Q(\tau_1,\tau_2|\tau_1+L)$$
$$= \frac{\gamma_{\tau_1}\,w(\tau_1-\tau_2+1|\tau_1)}{v_{\tau_1-\tau_2+1}+\gamma_{\tau_1}+\Theta(\tau_1-\tau_2-L)\,\gamma_{\tau_1-L}} + \delta_{\tau_1,M}\,w(\tau_1-\tau_2+1|M)$$
$$+ \delta_{\tau_1-\tau_2+1,L}\,\frac{\gamma_{\tau_1}}{v_L+\gamma_{\tau_1}}\sum_{x=L+1}^{\infty}\frac{\gamma_{\tau_1-L}\,w(x|\tau_1)}{v_x+\gamma_{\tau_1}+\gamma_{\tau_1-L}}$$
$$+ \Theta(M-\tau_1-L)\,\frac{\gamma_{\tau_1}\,w(\tau_1+L-\tau_2+1|\tau_1+L)}{v_{\tau_1+L-\tau_2+1}+\gamma_{\tau_1+L}+\gamma_{\tau_1}}, \qquad (15)$$

here $1 \le \tau_2 \le \tau_1 \le M$. Since the protein may be cleaved starting both from the C- and
from the N-terminal, the final digestion pattern is given by

$$Q_{\rm fin}(\tau_1,\tau_2) = P_N\,Q_N(\tau_1,\tau_2) + P_C\,Q_C(M-\tau_2+1,\,M-\tau_1+1). \qquad (16)$$

The subscripts indicate which terminal goes first; $P_N$ and $P_C = 1 - P_N$ are the
probabilities of the degradation starting from the corresponding end. Generally, $v_N(x)$
and $v_C(x)$ may be slightly different, but here we neglect this difference. Note that
the fragment length distribution $S(x)$ (often used in the literature [13]) is then the
convolution

$$S(x) = \sum_{\tau=x}^{M} Q(\tau,\tau-x+1). \qquad (17)$$

[Figure 2, panels a) to c). Recoverable axis labels: peptide bond # in panel a), with the AA
sequence TESPSF SAG DNPPVL FS SDFRISGAPE KY ESER RA GDNPPAGD N written along the axis;
length x (AA) in panel b). Legend in a) and b): original (test), reconstructed.]

Figure 2. Test: reconstruction of the translocation rate function $v(x)$ and conditional
cleavage rates $\gamma(\tau)$ for the 44mer peptide Kloe 316 [14] [15] [but with roughly estimated
authentic (original) values of $\gamma(\tau)$], which is the subsequence 543-586 AA of human
myelin associated glycoprotein. a) the conditional cleavage rates and the AA sequence;
b) the translocation rate function; c) the upper plot presents the set of digestion
fragments (black bars: fragments utilized for the reconstruction, grey bars: not
utilized), and the lower plot presents the amount of the corresponding fragment
(diamonds: the reconstructed values $Q_{\rm fin}$, grey bars: the values of $Q$ utilized for the
reconstruction).
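The procedure of this section (iterate the map from $w(x|1) = \delta_{x,1}$, accumulate the fragments, release the last fragment at $\tau = M$, mix both entry directions, and sum over fragments of equal length) can be sketched as follows. This is only an illustrative sketch: the function names, the dictionary keyed by $(\tau_1, \tau_2)$, and the index-0 padding of the input lists are assumptions of the example, and the bond numbering is assumed to start from the N-terminal.

```python
def digestion_pattern(v, gamma, L):
    """Fragment amounts Q[(t1, t2)] for a strand entering with bond 1 first.

    v[x]     : translocation rate for forward-end length x = 1..M (v[0] is padding)
    gamma[t] : conditional cleavage rate of bond t = 1..M-1 (gamma[0] is padding)
    L        : distance between the two rings of active sites, in AA
    """
    M = len(v) - 1
    Q = {}

    def add(t1, t2, amount):
        if amount > 0.0:
            Q[(t1, t2)] = Q.get((t1, t2), 0.0) + amount

    w = [0.0] * (M + 1)
    w[1] = 1.0                                   # w(x|1) = delta_{x,1}
    for tau in range(1, M):                      # transitions tau -> tau + 1
        g1 = gamma[tau]
        g2 = gamma[tau - L] if tau > L else 0.0
        w_next = [0.0] * (M + 1)
        cut = 0.0                                # mass moved to x = L by event (c)
        for x in range(1, tau + 1):
            gc = g2 if x > L else 0.0
            denom = v[x] + g1 + gc
            w_next[x + 1] += v[x] * w[x] / denom            # strand shift
            w_next[1] += g1 * w[x] / denom                  # first-ring cleavage
            add(tau, tau - x + 1, g1 * w[x] / denom)
            add(tau - L, tau - x + 1, gc * w[x] / denom)    # second-ring cleavage
            cut += gc * w[x] / denom
        if cut > 0.0:
            p2 = g1 / (v[L] + g1)                # P_2(L|tau); P_1(L|tau) = 1 - p2
            w_next[L + 1] += (1.0 - p2) * cut
            w_next[1] += p2 * cut
            add(tau, tau - L + 1, p2 * cut)
        w = w_next
    for x in range(1, M + 1):
        add(M, M - x + 1, w[x])                  # release of the last fragment
    return Q


def final_pattern(v, gamma, L, p_n):
    """Mix N-first and C-first degradation with probabilities p_n and 1 - p_n."""
    M = len(v) - 1
    q_n = digestion_pattern(v, gamma, L)
    gamma_c = [0.0] + [gamma[M - t] for t in range(1, M)]   # bonds counted from the other end
    q_c = digestion_pattern(v, gamma_c, L)
    Q = {key: p_n * q for key, q in q_n.items()}
    for (a, b), q in q_c.items():
        key = (M - b + 1, M - a + 1)             # back to N-terminal numbering
        Q[key] = Q.get(key, 0.0) + (1.0 - p_n) * q
    return Q


def fragment_length_distribution(Q, M):
    """S[x]: total amount of fragments of length x (S[0] is unused)."""
    S = [0.0] * (M + 1)
    for (t1, t2), q in Q.items():
        S[t1 - t2 + 1] += q
    return S
```

A useful consistency check for such a sketch is mass conservation: since every amino acid of the processed molecule ends up in exactly one fragment, the sum of (fragment length) times (fragment amount) should equal $M$.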

The digestion pattern $Q_{\rm fin}(\tau_1,\tau_2)$ is a functional of the TRF $v(x)$ and the CCR $\gamma(\tau)$. Utilizing
MS data on the digestion pattern, one can determine the nonzero values of $\gamma(\tau)$ (i.e.
positions of possible cleavage) and minimize the mismatch between $Q_{\rm fin}(\tau_1,\tau_2)$ and the MS
data $Q(\tau_1,\tau_2)$ over $v(x)$, the nonzero values of $\gamma(\tau)$, and $P_N$ in order to reconstruct them.
Expecting the function v(x) to be smooth, we parameterize appropriate approximate 
functions as 

-Al(y/A^-\A t \) 



\1 4 2 



v app (x) = v e 11 . (18) 

Note that $v(x)$ and $\gamma(\tau)$ are defined up to a constant multiplier, which should be determined
from the degradation rate in real time, but not from the digestion pattern.

In order to verify the robustness of the reconstruction procedure, numerous tests
have been performed. A typical test presented in figure 2 has been performed in 4 steps:

(1) For given $v(x)$ [not generic for $v_{\rm app}$, i.e., the function $v(x)$ used cannot be perfectly
fitted with expression (18)] and $\gamma(\tau)$, the digestion pattern $Q(\tau_1,\tau_2)$ has been evaluated.

(2) The result has been perturbed by noise, $\tilde Q_{\tau_1\tau_2} = Q_{\tau_1\tau_2} + 10^{-4} R_{\tau_1,\tau_2}\sqrt{Q_{\tau_1,\tau_2}}$, where
$R_{\tau_1,\tau_2}$ are independent random numbers uniformly distributed in $[-1,1]$.

(3) We have omitted the information about fragments whose relative amount is less than
$5\cdot10^{-3}$, and about 1mer and 2mer fragments, as being hardly detectable in experiments (one
cannot distinguish identical AAs cut out from different parts of the polypeptide [16]).

(4) The resulting $\tilde Q_{\tau_1\tau_2}$ has been used for the reconstruction of $v(x)$ and $\gamma(\tau)$.

The original and reconstructed data for $\gamma(\tau)$ (figure 2a) and $v(x)$ (figure 2b) are in
very good agreement. The reconstructed value is $P_N = 0.52$ against the original $P_N = 0.50$.
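Steps (2) and (3) of the test can be illustrated by a short sketch. The function name and parameters below are hypothetical, and the "relative amount" of a fragment is assumed here to be its share of the total amount of all fragments; the text does not specify the normalization, so this is one possible reading.

```python
import math
import random


def perturb_and_filter(Q, noise=1e-4, threshold=5e-3, min_length=3, seed=0):
    """Add noise to a digestion pattern and drop hardly detectable fragments.

    Q maps (t1, t2) to the fragment amount; the fragment length is t1 - t2 + 1.
    """
    rng = random.Random(seed)
    # Step (2): Q -> Q + noise * R * sqrt(Q), with R uniform in [-1, 1].
    noisy = {key: q + noise * rng.uniform(-1.0, 1.0) * math.sqrt(q)
             for key, q in Q.items()}
    total = sum(noisy.values())    # assumption: amounts are taken relative to the total
    # Step (3): omit rare fragments and 1mer/2mer fragments.
    return {(t1, t2): q for (t1, t2), q in noisy.items()
            if t1 - t2 + 1 >= min_length and q / total >= threshold}
```

The filtered dictionary would then play the role of the data set passed to the minimization in step (4).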

Unfortunately, the data available in the literature are mostly too incomplete
(many fragments are not accounted for) and not precise enough for a truly reliable
reconstruction [9] (the initial solutions used for experiments quite frequently contain
not only the polypeptide to be digested but also a certain amount of its fragments;
the first measurement of the composition of the solution is performed too late, when
considerably more than 5% of the initial substrate has been degraded and one may not
neglect reentries of the digestion fragments into the proteasome; etc.).

Thus, we should note the limitations of the suggested reconstruction method:

• The reconstruction procedure for short polypeptides is very sensitive to measurement
inaccuracy.

• For some polypeptides the procedure fails. This may happen due to a specific
arrangement of cleavage positions, when different TRFs $v(x)$ provide almost identical
digestion patterns.

• Though the whole information on $Q(\tau_1,\tau_2)$ is not needed, the number of nonzero
values of $Q(\tau_1,\tau_2)$ required for a reliable (tolerant to noise) reconstruction is at least
twice the number of reconstructed parameters, i.e. 2 × ([number of positions of potential cleavage] +
[number of parameters of $v_{\rm app}$] + 1). For instance, for Kloe 258 in [9] the number of
trustworthy and utilized values of $Q(\tau_1,\tau_2)$ is 19 instead of the required 2 × (10 + 3 + 1) =
28; it is only slightly greater than the number of unknown parameters, i.e., 14. Hence, more
accurate and comprehensive MS data on the digestion pattern are required.

• For short polypeptides the finishing stage of the degradation is relatively important,
while in this stage the translocation rate is affected by edge effects (the backward
end of the polypeptide gets inside the proteasome chamber) and is not the same as for
the remainder of the polypeptide.
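The data requirement stated in the third limitation is simple arithmetic; a tiny sketch (with a hypothetical function name) makes the count explicit for the numbers quoted there.

```python
def required_fragment_count(n_cleavage_positions, n_trf_parameters):
    """Minimal number of nonzero Q values for a noise-tolerant reconstruction."""
    # Unknowns: one rate per potential cleavage position, the parameters
    # of the approximate TRF, and the probability P_N.
    n_unknowns = n_cleavage_positions + n_trf_parameters + 1
    return 2 * n_unknowns


print(required_fragment_count(10, 3))  # 28, as quoted above for 14 unknowns
```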



3. Long synthetic polypeptides of a periodic amino acid sequence 

While a more comprehensive acquisition of data on digestion fragments and the enhancement
of experimental techniques for short polypeptides are up to experimentalists, we
propose an experimental set-up which allows overcoming all the limitations mentioned
above and is expected to be realizable. For this, a long synthetic polypeptide with a
$T$-periodic AA sequence, $\gamma(\tau) = \gamma(\tau+T)$, should be digested. Here "long" means that one
may neglect the peculiarities of the starting and finishing stages of the degradation, and
$M \gg T$.

For the given direction of the degradation, e.g., starting with the N-terminal, we
look for the established solution of equation (12) that is $T$-periodic in $\tau$:
$w_{N,T}(x|\tau) = w_{N,T}(x|\tau-T)$. The fragment $(n,m)$ is identical to the fragment $(n+kT,m+kT)$, where
$k$ is an integer; therefore $Q_N(m,n)$ may be chosen to contribute to
$Q_{N,T}(m-n+(n \bmod T),\ n \bmod T)$. The amount of fragments grows almost linearly with "time" $\tau$
as the polypeptide is being processed. Hence, for the digestion pattern one finds

$$Q_{N,T}(\tau_1,\tau_2) = \lim_{\tau\to\infty}\frac{1}{\tau}\sum_{\tau'=1}^{\tau} Q_N(\tau_1,\tau_2|\tau') = \frac{1}{T}\sum_{\tau'=1}^{T} Q_{N,T}(\tau_1,\tau_2|\tau')$$
$$= \frac{1}{T}\Bigg[\frac{\gamma_{\tau_1}\,w_{N,T}(\tau_1-\tau_2+1|\tau_1)}{v_{\tau_1-\tau_2+1}+\gamma_{\tau_1}+\Theta(\tau_1-\tau_2-L)\,\gamma_{\tau_1-L}} + \delta_{\tau_1-\tau_2+1,L}\,\frac{\gamma_{\tau_1}}{v_L+\gamma_{\tau_1}}\sum_{x=L+1}^{\infty}\frac{\gamma_{\tau_1-L}\,w_{N,T}(x|\tau_1)}{v_x+\gamma_{\tau_1}+\gamma_{\tau_1-L}}$$
$$+ \frac{\gamma_{\tau_1}\,w_{N,T}(\tau_1+L-\tau_2+1|\tau_1+L)}{v_{\tau_1+L-\tau_2+1}+\gamma_{\tau_1+L}+\gamma_{\tau_1}}\Bigg] \qquad (19)$$

(here $1 \le \tau_2 \le T$ and $\tau_1 \ge \tau_2$).

To treat the degradation process starting with the C-terminal, one has (i) to
perform the transformation $\gamma(\tau) \to \gamma(T-\tau)$, and (ii) to iterate the linear map (12) with
the new $\gamma(\tau)$ as in the N-case, but assuming $Q_C(m,n|\tau)$ to contribute to
$Q_{C,T}(m \bmod T,\ n-m+(m \bmod T))$. Unlike (16), the final result is

$$Q_{\rm fin}(\tau_1,\tau_2) = P_N\,Q_{N,T}(\tau_1,\tau_2) + P_C\,Q_{C,T}(T-\tau_2,\ T-\tau_1).$$
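For one direction of degradation, the established periodic regime can be approximated numerically by iterating the map over many periods and accumulating fragments over the last period only. The sketch below does this in plain Python. The function name, the truncation of $x$ at the length of the list `v`, the number of periods, and the choice to key fragments by (bond number mod $T$, fragment length) rather than by $(\tau_1, \tau_2)$ are all assumptions made for the illustration.

```python
def periodic_pattern(v, gamma_period, L, n_periods=200):
    """Per-step fragment amounts for a T-periodic cleavage-rate sequence.

    v[x]            : translocation rate, x = 1..X (v[0] is padding); X truncates x
    gamma_period[k] : cleavage rate of every bond tau with tau % T == k
    Returns Q[(k, length)], averaged over one period (amount per translocation step);
    k is the number (mod T) of the bond at which the fragment was cut off.
    """
    T = len(gamma_period)
    X = len(v) - 1
    w = [0.0] * (X + 1)
    w[1] = 1.0
    Q = {}
    last = n_periods * T
    for tau in range(1, last + 1):
        record = tau > last - T                  # accumulate over the final period
        g1 = gamma_period[tau % T]
        g2 = gamma_period[(tau - L) % T] if tau > L else 0.0
        w_next = [0.0] * (X + 1)
        cut = 0.0
        produced = []                            # (key, amount) pairs of this step
        for x in range(1, X + 1):
            if w[x] == 0.0:
                continue
            gc = g2 if x > L else 0.0
            denom = v[x] + g1 + gc
            if x < X:
                w_next[x + 1] += v[x] * w[x] / denom
            first = g1 * w[x] / denom
            second = gc * w[x] / denom
            w_next[1] += first
            cut += second
            produced.append(((tau % T, x), first))
            produced.append((((tau - L) % T, x - L), second))
        if cut > 0.0:
            p2 = g1 / (v[L] + g1)
            w_next[L + 1] += (1.0 - p2) * cut
            w_next[1] += p2 * cut
            produced.append(((tau % T, L), p2 * cut))
        if record:
            for key, amount in produced:
                if amount > 0.0:
                    Q[key] = Q.get(key, 0.0) + amount / T
        w = w_next
    return Q
```

In the established regime one amino acid enters per step, so the sum of (length) times (amount) over all keys should be close to 1; a noticeable deviation would indicate too few periods or too short a truncation length.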

Matching $Q_{\rm fin}(m,n)$ to the MS data, one can reconstruct $v(x)$, $\gamma(\tau)$, and $P_N$. For a
test we have made use of the cleavage map of the digestion of yeast enolase-1 by human
erythrocyte proteasome [17]. Looking at its subsequence 331-348 AA

... | ATAIEKKA | AD | ALLL | KV | NQ | ... -COOH

(vertical stripes mark the positions of experimentally observed cleavages), one may
expect the case where the underlined subsequence is followed not by KV, but by KKA...,
and the periodic sequence is

AD | ALLL | KKA | ... | AD | ALLL | KKA | ... | AD | ALLL | KKA-COOH ,

to be realizable. For such a sequence a test like the one in figure 2 (but with much
stronger dithering: $\tilde Q_{\tau_1\tau_2} = Q_{\tau_1\tau_2} + 2\cdot10^{-3} R_{\tau_1,\tau_2}\sqrt{Q_{\tau_1,\tau_2}}$) is presented in figure 3. Due
to the small number of unknown parameters the reconstruction procedure is rather
tolerant to measurement inaccuracy and does not require information on a large number
of digestion fragments (the most easily detectable fragments are enough).

Figure 3. Test: reconstruction of the translocation rate function $v(x)$ and conditional
cleavage rates $\gamma(\tau)$ for a 9-periodic polypeptide with the cleavage positions 2, 6, 9.
(For a description see the caption to figure 2.)

4. Long natural proteins 

The case of most immediate interest is the digestion of long natural proteins
(over about 300 AA) because it concerns the in vivo proteasomal activity. A direct
implementation of the procedure developed for short polypeptides is hardly possible
here, as in the course of matching $Q(\tau_1,\tau_2)$ to the MS data one has to perform a
minimization over an enormous number of parameters. However, for long non-periodic
proteins, one may assume $\gamma(\tau)$ to be a random process in order to evaluate some
observable statistical properties like the fragment length distribution (FLD) of the
digestion products, i.e. $S(x)$ [see equation (17)].

For this random process we adopt the following:

• the neighboring values $\gamma(\tau)$ and $\gamma(\tau+1)$ are mutually independent (which does not
necessarily mean that the CCR $\gamma(\tau)$ is independent of neighboring AAs);

• $\gamma(\tau)$ is zero with a certain probability $q$, and has a finite probability density $g(\gamma)$
otherwise.

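The normalized mean FLD $\bar S(x) = \langle S(x)\rangle / \sum_{x'=1}^{\infty}\langle S(x')\rangle$ may be evaluated either
via the plain iteration of (12)-(16) with noise $\gamma(\tau)$ over a large interval of $\tau$ or via the
direct simulation of the system with a Gillespie algorithm (e.g., see [18]). However,
the calculation procedure may be considerably facilitated. For this purpose, let us
average (12) over realizations of $\gamma(\tau)$,

$$\langle w(x|\tau+1)\rangle_\gamma = \sum_{y=1}^{\infty}\langle C_{xy}(\tau)\,w(y|\tau)\rangle_\gamma. \qquad (20)$$

Note that $w(x|\tau)$ depends on $\gamma(\tau-1)$ and the preceding values of $\gamma$ but is independent
of $\gamma(\tau)$. Moreover, the impact of preceding values of $\gamma$ decays in the course of the
processing of the protein, and one may neglect the correlation between $w(x|\tau)$ and
$\gamma(\tau-L)$, which are mutually distant in $\tau$. Thus, $w(x|\tau)$ is independent of $\gamma(\tau)$ and
$\gamma(\tau-L)$, which are involved in $C_{xy}(\tau)$, and (20) yields

$$\langle w(x|\tau+1)\rangle_\gamma \approx \sum_{y=1}^{\infty}\langle C_{xy}(\tau)\rangle_{\gamma_\tau\gamma_{\tau-L}}\,\langle w(y|\tau)\rangle_\gamma; \qquad (21)$$

from (13), (14), (17),

$$\langle S(x|\tau+1)\rangle_\gamma = \langle S(x|\tau)\rangle_\gamma + \left\langle\frac{\gamma_\tau}{v_x+\gamma_\tau+\Theta(x-L-1)\,\gamma_{\tau-L}}\right\rangle_{\gamma_\tau\gamma_{\tau-L}}\langle w(x|\tau)\rangle_\gamma$$
$$+ \left\langle\frac{\gamma_{\tau-L}}{v_{L+x}+\gamma_\tau+\gamma_{\tau-L}}\right\rangle_{\gamma_\tau\gamma_{\tau-L}}\langle w(L+x|\tau)\rangle_\gamma$$
$$+ \delta_{x,L}\sum_{x'=L+1}^{\infty}\left\langle\frac{\gamma_\tau\,\gamma_{\tau-L}}{(v_L+\gamma_\tau)(v_{x'}+\gamma_\tau+\gamma_{\tau-L})}\right\rangle_{\gamma_\tau\gamma_{\tau-L}}\langle w(x'|\tau)\rangle_\gamma, \qquad (22)$$

where

$$\langle f(\gamma_1,\gamma_2)\rangle_{\gamma_1\gamma_2} = q^2 f(0,0) + q(1-q)\int_0^\infty g(\gamma)\,[f(0,\gamma)+f(\gamma,0)]\,d\gamma$$
$$+ (1-q)^2\int_0^\infty d\gamma_1\int_0^\infty d\gamma_2\ g(\gamma_1)\,g(\gamma_2)\,f(\gamma_1,\gamma_2).$$

The FLD observed in experiments is $\bar S(x)$ corresponding to the established steady
solution $\langle w(x|\infty)\rangle$ of the linear map (21).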
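The direct simulation that serves as the reference for these approximations can be sketched as follows. Because only the order of events matters for the fragment lengths, this sketch samples the embedded jump chain of the Gillespie scheme (it picks the next event with probability proportional to its rate and does not track real time). The function name, the callable `v`, and the uniform distribution of nonzero rates on $[0, \gamma_{\max}]$ are assumptions of the example.

```python
import random


def simulate_fragments(M, L, q, gamma_max, v, seed=0):
    """Digest one protein of M amino acids; return the list of fragment lengths.

    q         : probability that a bond is non-scissile (gamma = 0)
    gamma_max : nonzero rates are drawn uniformly from [0, gamma_max]
    v         : callable giving the translocation rate v(x) > 0
    """
    rng = random.Random(seed)
    # gamma[t] for bonds t = 1..M-1 (index 0 is padding).
    gamma = [0.0] + [0.0 if rng.random() < q else rng.uniform(0.0, gamma_max)
                     for _ in range(M - 1)]
    lengths = []
    x, tau = 1, 1
    while tau < M:
        if x == 0:                     # right after a first-ring cleavage: only a shift
            x, tau = 1, tau + 1
            continue
        r_first = gamma[tau]
        r_second = gamma[tau - L] if x > L else 0.0
        u = rng.random() * (v(x) + r_first + r_second)
        if u < r_first:                # event (b): the forward end is cut off
            lengths.append(x)
            x = 0
        elif u < r_first + r_second:   # event (c): the part beyond the second ring is cut off
            lengths.append(x - L)
            x = L
        else:                          # event (a): shift by one amino acid
            x, tau = x + 1, tau + 1
    lengths.append(x)                  # release of the last fragment
    return lengths
```

A histogram of the returned lengths, accumulated over a long protein or over many realizations, gives an estimate of the normalized FLD. The fragment lengths of a single run always add up to $M$, which provides a simple correctness check.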

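Note that, with the additional approximation

$$\langle C_{xy}(\gamma_\tau,\gamma_{\tau-L})\rangle_{\gamma_\tau\gamma_{\tau-L}} \approx C_{xy}(\langle\gamma\rangle,\langle\gamma\rangle),$$

one may obtain an implicit recursive formula for the established $\langle w(x|\tau)\rangle$ from (21),

$$\langle w(x+1|\infty)\rangle = \frac{(1+\delta_{x,L})\,v_x}{v_x+(1+\Theta(x-L))\langle\gamma\rangle}\,\langle w(x|\infty)\rangle, \qquad (23)$$

and find the FLD $\bar S(x)$ from

$$\bar S(x) = \left[\frac{(1+\delta_{x,L})\,\langle\gamma\rangle\,\langle w(x|\infty)\rangle}{v_x+(1+\Theta(x-L))\langle\gamma\rangle} + \frac{\langle\gamma\rangle\,\langle w(L+x|\infty)\rangle}{v_{L+x}+2\langle\gamma\rangle}\right]\Big/\left[\langle w(1|\infty)\rangle + \frac{\langle w(L+1|\infty)\rangle}{2}\right]. \qquad (24)$$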
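A short numerical sketch of this mean-rate approximation is given below. It assumes the recursion $\langle w(x+1|\infty)\rangle = (1+\delta_{x,L})\,v_x\,\langle w(x|\infty)\rangle / [v_x + (1+\Theta(x-L))\langle\gamma\rangle]$ with $\Theta(0) = 1$, and the normalization by $\langle w(1|\infty)\rangle + \langle w(L+1|\infty)\rangle/2$; the function name and the truncation of $x$ at the length of the list `v` are choices of the example, not of the original text.

```python
def mean_rate_fld(v, L, gamma_mean):
    """Normalized FLD in the mean-rate approximation (hypothetical helper).

    v[x]       : translocation rate for x = 1..X (v[0] is padding), X well above 2 * L
    gamma_mean : mean conditional cleavage rate <gamma>
    Returns S with S[x] for x = 1..X-L (S[0] is unused).
    """
    X = len(v) - 1
    w = [0.0] * (X + 1)
    w[1] = 1.0                               # arbitrary scale, removed by the normalization
    for x in range(1, X):
        theta = 1.0 if x >= L else 0.0       # Heaviside function with Theta(0) = 1
        jump = 2.0 if x == L else 1.0        # factor (1 + delta_{x,L})
        w[x + 1] = jump * v[x] / (v[x] + (1.0 + theta) * gamma_mean) * w[x]
    norm = w[1] + 0.5 * w[L + 1]             # mean number of fragments produced per step
    S = [0.0] * (X - L + 1)
    for x in range(1, X - L + 1):
        theta = 1.0 if x >= L else 0.0
        jump = 2.0 if x == L else 1.0
        first_ring = jump * gamma_mean * w[x] / (v[x] + (1.0 + theta) * gamma_mean)
        second_ring = gamma_mean * w[L + x] / (v[L + x] + 2.0 * gamma_mean)
        S[x] = (first_ring + second_ring) / norm
    return S
```

With a truncation length large enough for the tail of $\langle w(x|\infty)\rangle$ to be negligible, the returned values should sum to 1, which checks the normalization used here.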
Figure 4. Samples of the fragment length distribution $\bar S(x)$ (FLD) for the degradation
of a long natural protein under the assumption that the conditional cleavage rate $\gamma(\tau)$
may be treated as a random process. The fraction $q$ of nonscissile peptide bonds is
indicated in the plots, nonzero values of $\gamma(\tau)$ are uniformly distributed in $[0,\gamma_{\max}]$,
$L = 9$, and the adopted translocation rate function $v(x)$ is plotted in the right plot.
In the two left plots, bars: results of the direct simulation with a Gillespie algorithm,
squares: the approximation (21), (22), circles: the approximation (23), (24), with
$\langle\gamma\rangle = (1-q)\gamma_{\max}/2$.



Remarkably, in the quasi-continuous limit (which is valid when $v(x)$ is a "slow" function
of $x$), the last expressions provide (cf. [18])

$$\langle w(x|\infty)\rangle = (1+\Theta(x-L))\,\langle w(0|\infty)\rangle\,\exp\left[-\int_0^x\frac{(1+\Theta(x'-L))\langle\gamma\rangle}{v(x')}\,dx'\right],$$

$$\bar S(x) = \langle\gamma\rangle\left\{\frac{1+\Theta(x-L)}{v(x)}\exp\left[-\int_0^x\frac{(1+\Theta(x'-L))\langle\gamma\rangle}{v(x')}\,dx'\right] + \frac{2}{v(L+x)}\exp\left[-\int_0^{L+x}\frac{(1+\Theta(x'-L))\langle\gamma\rangle}{v(x')}\,dx'\right]\right\}\Big/\left\{1+\exp\left[-\int_0^L\frac{\langle\gamma\rangle}{v(x')}\,dx'\right]\right\}.$$

In figure 4 one may see that both above-mentioned approximations become
more accurate as $q$ decreases. However, for the realistic value $q \approx 3/4$, which is suggested
by experimental cleavage maps (see figure 2a, where the sites of potential cleavage
are taken from experimental data), the approximation (21), (22) works considerably
better than the approximation (23), (24). Remarkably, as $q$ increases with $\langle\gamma\rangle$ kept fixed, the local
maximum near $x = L$ shifts from $x = L$ to higher values of the fragment length $x$, and the
cutting-out of longer peptides becomes more probable. The existence of this maximum
at $L \approx 8$-$10$ AA deserves special attention because the epitopes, involved in the
functioning of the immune system and bound to MHC I molecules, have exactly such
length [19].

An important limitation of this method is related to the reconstruction of $v(x)$
for 1mer and 2mer peptides. These peptides are hardly detectable in experiments;
therefore, the experimental $\bar S(x)$ is not determined for $x = 1, 2$, and one cannot
reconstruct the respective values of $v(x)$. Note that for the methods suggested in sections 2 and 3
this limitation does not occur because, e.g., for the subsequence | F | S | SDFRISGAPE |
in figure 2a the information on $v(1)$ is reflected in the difference between the readily
measurable amounts of the generated peptides | S | SDF... | and | SDF... |, while for long
natural proteins we lose the individual information on each specific peptide cut out.

5. Conclusion 

In this paper we have discussed a model of the degradation of proteins by the proteasome
which allows one to reconstruct the proteasomal translocation function and the cleavage
specificity inherent to the amino acid sequence and not affected by proteasomal transport
properties. With these properties determined, one can comprehensively predict digestion
patterns of new proteins. The model is relevant for a broad variety of hypothetically
possible translocation mechanisms [8] [11]. We have mathematically elaborated this
model for the cases of (i) relatively short (25-50mer) synthetic polypeptides as the
most common case for in vitro experiments, (ii) long periodic polypeptides (proposed
experiments with such polypeptides are very promising for reverse engineering), and
(iii) long natural proteins.

In [18], we have already discussed how peculiarities of the translocation function
may lead to the multimodality of the fragment length distribution even for $\gamma(\tau) = {\rm const}$.
Here we have shown that the amount of each digestion fragment is not only determined
by the cleavage map [specifically, the conditional cleavage rate $\gamma(\tau)$] of the substrate but
is also crucially affected by nonuniformity of the translocation rate. Applying
the developed theory to experimental data on digestion
patterns for different proteasome species under different conditions can give insight into
the nature of the protein translocation mechanism inside the proteasome. It can also
help to answer the open question of whether there is some preference for starting the
degradation with the N- or C-terminal of the protein, and how this preference is affected
by regulatory complexes. Hopefully, these theoretical results will stimulate new experiments,
as suggested in this paper for the case of a periodic polypeptide.